Crossing Sentence Boundaries in Statistical Machine Translation
Similar resources
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
Extending a probabilistic phrase alignment approach for SMT
Phrase alignment is a crucial step in phrase-based statistical machine translation. We explore a way of improving phrase alignment by adding syntactic information in the form of chunks as soft constraints guided by an in-depth and detailed analysis on a hand-aligned data set. We extend a probabilistic phrase alignment model that extracts phrase pairs by optimizing phrase pair boundaries over th...
Rule-based Reordering Constraints for Phrase-based SMT
Translation results suffer when a standard phrase-based statistical machine translation system is used for translating long sentences. The translation output will not preserve the same word order as the source, especially between a language pair that has different syntactic structures. When a sentence is long, it should be partitioned into several clauses, and the word reordering during the tra...
Evaluating machine translation output with automatic sentence segmentation
This paper presents a novel automatic sentence segmentation method for evaluating machine translation output with possibly erroneous sentence boundaries. The algorithm can process translation hypotheses with segment boundaries which do not correspond to the reference segment boundaries, or a completely unsegmented text stream. Thus, the method is especially useful for evaluating translations of...
Improving speech translation with automatic boundary prediction
This paper investigates the influence of automatic sentence boundary and sub-sentence punctuation prediction on machine translation (MT) of automatically recognized speech. We use prosodic and lexical cues to determine sentence boundaries, and successfully combine two complementary approaches to sentence boundary prediction. We also introduce a new feature for segmentation prediction that direc...